Incorporating Structured Commonsense Knowledge in Story Completion
The ability to select an appropriate story ending is the first step towards
perfect narrative comprehension. Story ending prediction requires not only the
explicit clues within the context, but also the implicit knowledge (such as
commonsense) to construct a reasonable and consistent story. However, most
previous approaches do not explicitly use background commonsense knowledge. We
present a neural story ending selection model that integrates three types of
information: narrative sequence, sentiment evolution and commonsense knowledge.
Experiments show that our model outperforms state-of-the-art approaches on a
public dataset, the ROCStory Cloze Task, and that the performance gain from
adding commonsense knowledge is significant.
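As a rough sketch of how the three evidence sources could be combined to rank candidate endings (the scoring heuristics, weights, sentiment lexicon, and knowledge-base entries below are invented stand-ins for illustration, not the paper's neural model):

```python
# Toy sketch: rank candidate story endings by combining three evidence
# sources (narrative sequence, sentiment evolution, commonsense links).
# All scorers below are crude stand-in heuristics, not neural components.

def toks(text):
    # Lowercase and strip simple punctuation.
    return text.lower().replace(".", "").replace(",", "").split()

def narrative_score(context, ending):
    # Word-overlap proxy for narrative coherence with the context.
    ctx = set(toks(context))
    end = toks(ending)
    return len(ctx & set(end)) / max(len(end), 1)

def sentiment_score(context, ending, lexicon):
    # Crude sentiment-evolution check: 1.0 if the ending's polarity
    # agrees with the context's. lexicon maps word -> +1 / -1.
    pol = lambda text: sum(lexicon.get(w, 0) for w in toks(text))
    return 1.0 if pol(context) * pol(ending) >= 0 else 0.0

def commonsense_score(ending, kb):
    # Count word pairs in the ending that appear as related in a toy
    # knowledge base of (head, tail) links.
    words = toks(ending)
    return float(sum(1 for h in words for t in words if (h, t) in kb))

def select_ending(context, candidates, lexicon, kb, w=(1.0, 0.5, 0.5)):
    def total(e):
        return (w[0] * narrative_score(context, e)
                + w[1] * sentiment_score(context, e, lexicon)
                + w[2] * commonsense_score(e, kb))
    return max(candidates, key=total)

context = "Tom forgot his umbrella. The rain started pouring."
candidates = ["Tom got soaked in the rain.", "Tom won the lottery."]
lexicon = {"soaked": -1, "rain": -1, "forgot": -1, "won": 1, "lottery": 1}
kb = {("rain", "soaked"), ("soaked", "rain")}
print(select_ending(context, candidates, lexicon, kb))
# prints: Tom got soaked in the rain.
```

The point of the weighted sum is that each information channel can veto an ending the other channels slightly prefer, mirroring the paper's claim that the signals are complementary.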
Unlearn What You Want to Forget: Efficient Unlearning for LLMs
Large language models (LLMs) have achieved significant progress from
pre-training on and memorizing a wide range of textual data; however, this
process can raise privacy issues and violate data protection
regulations. As a result, the ability to easily remove data related to
individual users from such models while not deteriorating their predictive
quality after the removal becomes increasingly important. To address these
issues, in this work, we propose an unlearning framework that can
efficiently update LLMs without retraining the whole model after data
removal, by introducing lightweight unlearning layers, learned with a
selective teacher-student objective, into the transformers. In addition, we
introduce a fusion mechanism that effectively combines unlearning layers
trained to forget different sets of data, so as to handle a sequence of
forgetting operations.
Experiments on classification and generation tasks demonstrate the
effectiveness of our proposed methods compared to the state-of-the-art
baselines. Comment: EMNLP 202
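A minimal sketch of the selective teacher-student idea on toy next-token distributions (the uniform target for forgotten data and all numbers here are illustrative assumptions, not the paper's exact objective):

```python
import math

# Toy sketch of a selective teacher-student unlearning objective: the
# student (original model plus a lightweight unlearning layer) is trained
# to stay close to the teacher (the original model) on retained data,
# while on forgotten data it is pushed away from the teacher -- here,
# toward a uniform distribution. All distributions are toy next-token
# probabilities; the uniform target is an illustrative choice.

def kl(p, q):
    # Kullback-Leibler divergence KL(p || q) in nats.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def unlearning_loss(student, teacher, is_forget):
    if is_forget:
        # Forget set: match a uniform distribution instead of the
        # teacher, erasing the teacher's memorized preference.
        uniform = [1.0 / len(student)] * len(student)
        return kl(student, uniform)
    # Retain set: standard distillation toward the teacher.
    return kl(student, teacher)

teacher = [0.70, 0.20, 0.10]   # teacher strongly prefers token 0
close   = [0.65, 0.25, 0.10]   # student that mimics the teacher
flat    = [0.34, 0.33, 0.33]   # student near uniform

# Retained data rewards mimicking the teacher; forgotten data rewards
# moving away from it.
print(unlearning_loss(close, teacher, False) < unlearning_loss(flat, teacher, False))  # True
print(unlearning_loss(flat, teacher, True) < unlearning_loss(close, teacher, True))    # True
```

Because only the small unlearning layer is optimized against this objective, the base model's weights stay frozen, which is what makes the update cheap relative to full retraining.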
A Cheaper and Better Diffusion Language Model with Soft-Masked Noise
Diffusion models based on iterative denoising have recently been
proposed and leveraged in various generation tasks, such as image generation.
However, as methods inherently built for continuous data, existing diffusion
models still have limitations in modeling discrete data such as language.
For example, the commonly used Gaussian noise cannot handle discrete
corruption well, and objectives defined in continuous spaces become unstable
for textual data in the diffusion process, especially when the dimension is high. To
alleviate these issues, we introduce a novel diffusion model for language
modeling, Masked-Diffuse LM, with lower training cost and better performance,
inspired by linguistic features of language. Specifically, we design a
linguistically informed forward process that corrupts the text through
strategic soft-masking, adding noise to the textual data more effectively.
We also directly predict the categorical distribution with a cross-entropy
loss at every diffusion step, connecting the continuous and discrete spaces
in a more efficient and straightforward way. Through experiments on five controlled
generation tasks, we demonstrate that our Masked-Diffuse LM can achieve better
generation quality than the state-of-the-art diffusion models with better
efficiency. Comment: Code is available at
https://github.com/amazon-science/masked-diffusion-l
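A toy sketch of a linguistic-informed soft-masking forward process (the per-token importance weights and the linear schedule below are invented proxies for the paper's tf-idf-style weighting, not its exact formulation):

```python
import random

# Toy sketch of a linguistic-informed soft-masking forward process:
# low-importance tokens are corrupted early in the diffusion schedule,
# while more informative words survive until later steps. The importance
# weights and the linear schedule below are invented proxies for the
# paper's tf-idf-style weighting, not its exact formulation.

def mask_schedule(importance, t, T):
    # Masking probability per token at step t of T: weight w in [0, 1];
    # low-w tokens reach probability 1 early, high-w tokens only near T.
    return [min(1.0, (t / T) * (2.0 - w)) for w in importance]

def soft_mask(tokens, importance, t, T, rng):
    # Sample the corrupted sequence at step t.
    probs = mask_schedule(importance, t, T)
    return [("[MASK]" if rng.random() < p else tok)
            for tok, p in zip(tokens, probs)]

tokens = ["the", "cat", "chased", "a", "mouse"]
importance = [0.1, 0.9, 0.8, 0.1, 0.9]   # content words weighted higher
rng = random.Random(0)
for t in (2, 5, 10):
    print(t, soft_mask(tokens, importance, t, T=10, rng=rng))
```

The easy-first corruption order means the reverse (denoising) process reconstructs the most informative words first, which is the intuition behind masking "strategically" rather than uniformly.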
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
We consider the problem of eliciting compositional generalization
capabilities in large language models (LLMs) with a novel type of prompting
strategy. Compositional generalization empowers the LLMs to solve problems that
are harder than the ones they have seen (i.e., easy-to-hard generalization),
which is a critical reasoning capability of human-like intelligence. However,
even the current state-of-the-art LLMs still struggle with this form of
reasoning. To bridge this gap, we propose skills-in-context (SKiC) prompting,
which teaches LLMs how to compose basic skills to solve more complex
problems. We find that it is crucial to demonstrate both the skills and the
compositional examples within the same prompting context. With as few as two
exemplars, our SKiC prompting initiates strong synergies between skills and
their composition capabilities. Notably, it empowers LLMs to solve unseen
problems that require innovative skill compositions, achieving near-perfect
generalization on a broad range of challenging compositionality tasks.
Intriguingly, SKiC prompting unlocks the latent potential of LLMs, enabling
them to leverage pre-existing internal skills acquired during earlier
pre-training stages, even when these skills are not explicitly presented in the
prompting context. This results in the capability of LLMs to solve unseen
complex problems by activating and composing internal competencies. With such
prominent features, SKiC prompting is able to achieve state-of-the-art
performance on challenging mathematical reasoning benchmarks (e.g., MATH)
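A minimal sketch of how a SKiC-style prompt might be assembled, placing basic skills and compositional exemplars in the same context ahead of the new problem (the skills and exemplars here are invented for illustration, not taken from the paper):

```python
# Toy sketch of assembling a skills-in-context (SKiC) prompt: the basic
# skills and worked examples that compose them share one context, followed
# by the new question. Skills and exemplars below are invented examples.

SKILLS = [
    "Skill 1 (count letters): 'apple' has 5 letters.",
    "Skill 2 (compare numbers): 5 > 3.",
]

EXEMPLARS = [
    ("Which word is longer, 'cat' or 'horse'?",
     "Use Skill 1: 'cat' has 3 letters, 'horse' has 5. "
     "Use Skill 2: 5 > 3, so 'horse' is longer."),
]

def build_skic_prompt(question):
    parts = ["Basic skills:"]
    parts += SKILLS
    parts.append("Worked examples composing the skills:")
    for q, a in EXEMPLARS:
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)

prompt = build_skic_prompt("Which word is longer, 'sun' or 'planet'?")
print(prompt)
```

The key design point the abstract stresses is that skills and composition examples appear in the *same* context window, so the model can ground each composition step in an explicitly stated skill.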